


Lesson: Parsing Data for Data Cleanse 


• Names associated with nicknames (for example, Elizabeth for Liz, Beth, or Betsy)
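As a rough illustration of why nickname association helps matching, the sketch below maps nicknames to a standard name before comparing two records. The lookup table and the function names (`match_standard`, `same_person_candidate`) are hypothetical and are not part of the Data Cleanse transform itself.

```python
# Hypothetical nickname table; a real cleansing dictionary would be far larger.
NICKNAME_TO_STANDARD = {
    "liz": "Elizabeth",
    "beth": "Elizabeth",
    "betsy": "Elizabeth",
}


def match_standard(first_name: str) -> str:
    """Return the standard form of a first name, or the trimmed name unchanged."""
    cleaned = first_name.strip()
    return NICKNAME_TO_STANDARD.get(cleaned.lower(), cleaned)


def same_person_candidate(name_a: str, name_b: str) -> bool:
    """Compare two first names after mapping nicknames to a standard form."""
    return match_standard(name_a).lower() == match_standard(name_b).lower()
```

With this table, "Liz" and "Betsy" both resolve to Elizabeth, so two records that differ only in the nickname used can still be treated as match candidates.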


This example shows how data cleansing can prepare records for matching. 


Parse data

Input data:

    Mr. Dan R. Smith, Jr., CPA
    Account Mgr.
    Jones Inc.
    Dept. of Accounting
    PO Box 567
    Biron, WI 54494

Parsed data:

    Prename              Mr.
    First Name           Dan
    Middle Name          R.
    Last Name            Smith
    Maturity Postname    Jr.
    Other Postname       CPA
    Title                Account Mgr.
    Firm                 Jones Inc.
    Firm Location        Dept. of Accounting
    Extra                PO Box 567
    Extra                Biron, WI 54494

Input data:

    James Witt
    421-55-2424
    jwitt@rdrindustries.com
    507-555-3423
    Aug 20, 2003

Parsed data:

    First Name           James
    Last Name            Witt
    Social Security      421-55-2424
    E-mail address       jwitt@rdrindustries.com
    Phone                507-555-3423
    Date                 Aug 20, 2003

Figure 19: Business Need for Data Cleanse Transform 1
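The second example in the figure (a name followed by a Social Security number, e-mail address, phone number, and date) can be approximated with a few pattern rules. The following is a minimal sketch, assuming US-style formats; the patterns, the `parse_lines` function, and the two-word name fallback are illustrative assumptions, not the rules the transform actually uses.

```python
import re

# Assumed patterns for US-style values; real data would need broader rules.
PATTERNS = {
    "Social Security": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "E-mail address": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "Phone": re.compile(r"^\d{3}-\d{3}-\d{4}$"),
    "Date": re.compile(
        r"^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, \d{4}$"
    ),
}


def parse_lines(lines):
    """Label each input line by its content, not by its position."""
    parsed = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        label = next(
            (name for name, rx in PATTERNS.items() if rx.match(line)), None
        )
        if label is not None:
            parsed[label] = line
            continue
        # Fallback assumption: an unrecognized two-word line is a person name.
        parts = line.split()
        if len(parts) == 2:
            parsed["First Name"], parsed["Last Name"] = parts
        else:
            parsed.setdefault("Extra", []).append(line)
    return parsed


record = parse_lines([
    "James Witt",
    "421-55-2424",
    "jwitt@rdrindustries.com",
    "507-555-3423",
    "Aug 20, 2003",
])
```

Because each line is classified by its content, the same fields come out even when the input lines arrive in a different order.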


• Identifying and isolating a wide variety of data, even if the data is floating in lines.

• Standardizing data to make records more consistent, such as fixing casing, punctuation, and abbreviations.

• Assigning a precise gender code to each name: strong male, strong female, weak male, weak female, or ambiguous.

• Assigning a prename such as Mr., Ms., or Mrs. based on gender codes.

• Creating personalized greetings in formal, casual, and title styles: Dear Mr. Jones, Dear Robert, and Dear Manager. The transform creates a greeting for each person, as well as a dual name greeting for records with two names.

• Creating a separate output record for each person in a record with multiple persons. For example, an input database can contain one record for each customer with multiple contact persons, each of whom can be split into a separate record.
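The gender code, prename, and greeting steps in the list above can be sketched as a small chain of lookups. This is only an illustration: the name table, the rule that any male or female code (strong or weak) receives Mr. or Ms., and the fallback greeting for ambiguous names are all assumptions, not documented transform behavior.

```python
from typing import Optional

# Hypothetical lookup; the five gender codes are the ones listed above.
GENDER_BY_NAME = {
    "robert": "strong male",
    "elizabeth": "strong female",
    "pat": "ambiguous",
}


def gender_code(first_name: str) -> str:
    """Return a gender code, defaulting to 'ambiguous' for unknown names."""
    return GENDER_BY_NAME.get(first_name.strip().lower(), "ambiguous")


def prename_for(code: str) -> Optional[str]:
    """Assumed rule: female codes get 'Ms.', male codes get 'Mr.', else none."""
    # Check 'female' first, because 'female' also ends with 'male'.
    if code.endswith("female"):
        return "Ms."
    if code.endswith("male"):
        return "Mr."
    return None


def greetings(first: str, last: str, title: str) -> dict:
    """Build formal, casual, and title greetings for one person."""
    prename = prename_for(gender_code(first))
    # Assumed fallback: use the full name when no prename can be assigned.
    formal = f"Dear {prename} {last}" if prename else f"Dear {first} {last}"
    return {
        "formal": formal,
        "casual": f"Dear {first}",
        "title": f"Dear {title}",
    }


robert = greetings("Robert", "Jones", "Manager")
```

For the name Robert Jones with the title Manager, `robert` holds the three greeting styles named in the list: Dear Mr. Jones, Dear Robert, and Dear Manager.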


Finalized data cleanse transform

• Standardize data

• Assign gender and prenames

• Create personalized greetings

• Create separate data for each person
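The first and last steps listed above, standardizing data and creating separate data for each person, can be sketched as follows. The record layout (`customer`, `contacts`, `contact`) and the simple title-casing rule are hypothetical choices made for this example.

```python
def standardize(value: str) -> str:
    """Collapse repeated whitespace and apply title casing (a simple rule)."""
    return " ".join(value.split()).title()


def split_persons(record: dict) -> list:
    """Emit one output record per contact person, copying the shared fields."""
    shared = {key: val for key, val in record.items() if key != "contacts"}
    return [
        {**shared, "contact": standardize(name)}
        for name in record["contacts"]
    ]


# One customer record that holds two contact persons.
customer = {"customer": "Jones Inc.", "contacts": ["dan smith", "JAMES  witt"]}
rows = split_persons(customer)
```

Here `rows` contains two records, one for Dan Smith and one for James Witt, each carrying the shared customer field. Title casing is a deliberately naive standardization; names with internal capitals would need dictionary-based rules.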